Application of k Nearest Neighbor on Feature Projections Classi er to Text Categorization
نویسنده
چکیده
This paper presents the results of the application of an instance based learning algorithm k Nearest Neighbor Method on Fea ture Projections k NNFP to text categorization and compares it with k Nearest Neighbor Classi er k NN k NNFP is similar to k NN ex cept it nds the nearest neighbors according to each feature separately Then it combines these predictions using a majority voting This prop erty causes k NNFP to eliminate possible adverse e ects of irrelevant features on the classi cation accuracy Experimental evidence indicates that k NNFP is superior to k NN in terms of classi cation accuracy in the presence of irrelevant features in many real world domains
منابع مشابه
Application of k - Nearest Neighbor on FeatureProjections Classi er to Text
This paper presents the results of the application of an instance-based learning algorithm k-Nearest Neighbor Method on Feature Projections (k-NNFP) to text categorization and compares it with k-Nearest Neighbor Classiier (k-NN). k-NNFP is similar to k-NN except it nds the nearest neighbors according to each feature separately. Then it combines these predictions using a majority voting. This pr...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملA Comparative Study on Feature Selection in Text Categorization
This paper is a comparative study of feature selection methods in statistical learning of text categorization The focus is on aggres sive dimensionality reduction Five meth ods were evaluated including term selection based on document frequency DF informa tion gain IG mutual information MI a test CHI and term strength TS We found IG and CHI most e ective in our ex periments Using IG thresholdin...
متن کاملNatural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers
N grams o er fast language independent multi-class text categorization. Text is reduced in a single pass to ngram vectors. These are assigned to one of several classes by a) nearest neighbour (KNN) and b) genetic algorithm operating on weights in a nearest neighbour classi er. 91% accuracy is found on binary classi cation on short multi-author technical English documents. This falls if more cat...
متن کامل